尽管基于计划的序列建模方法在连续控制方面表现出巨大的潜力,但由于高维空间中规划的高度计算复杂性和天生的困难,将它们扩展到高维状态序列仍然是一个开放的挑战。我们提出了轨迹自动编码计划器(TAP),这是一种基于计划的序列建模RL方法,可扩展到高州行动维度。使用状态条件矢量定量的变分自动编码器(VQ-VAE),点击模拟给定当前状态的轨迹的条件分布。当部署为RL代理时,TAP避免在高维连续动作空间中逐步计划,而是通过Beam Search寻找最佳的潜在代码序列。与$ o(d^3)$轨迹变压器的复杂性不同,TAP享受常数$ o(c)$规划有关州行动维度$ d $的计算复杂性。我们的经验评估还表明,随着维度的增长,TAP的表现越来越强。对于具有较高状态和动作维度的ADROIT机器人手动操纵任务,TAP超过了基于模型的方法,包括TT,其边距很大,并且还击败了强大的无模型参与者 - 批评基准。
translated by 谷歌翻译
随着对深度学习民主化的向往,在资源约束设备上实施基于变压器的自然语言处理(NLP)模型的需求越来越大,以实施低延迟和高准确性。现有的BERT修剪方法要求域专家启发手工制作超参数,以在模型大小,延迟和准确性之间取得平衡。在这项工作中,我们提出了AE-Bert,这是一种具有有效评估的自动和高效的BERT修剪框架,以选择“良好”子网络候选(高精度),鉴于整体修剪比率的约束。我们提出的方法不需要人类专家的经验,并且可以在许多NLP任务上取得更好的准确性能。我们关于一般语言理解评估(胶水)基准的实验结果表明,AE-Bert优于Bert $ _ {\ Mathrm {base}} $的最先进的(SOTA)手工制作的修剪方法。在QNLI和RTE上,我们获得75 \%和42.8%的总体修剪比,同时获得更高的精度。在MRPC上,我们的得分比SOTA高4.6,在相同的整体修剪比为0.5。在STS-B上,与SOTA手工制作的修剪方法相比,我们可以达到40 \%的修剪比,而Spearman相关性的损失非常小。实验结果还表明,在模型压缩之后,单个bert $ _ {\ mathrm {base}} $ coder的推理时间在xilinx alveo u200 fpga板上具有1.83 $ \ times $ speedup,与intel(r)xeon相比)Gold 5218(2.30GHz)CPU,它显示了部署BERT $ _ {\ MATHRM {base}} $模型在计算限制设备上生成的方法生成的子网的合理性。
translated by 谷歌翻译
这项工作介绍了一种基于高准确性EMG的手势识别的方法。一种新开发的深度学习方法,即,深层残留的收缩网络用于执行手势识别。基于手势引起的EMG信号的特征,进行了优化以提高识别精度。最后,应用三种不同的算法将EMG信号识别的准确性与DRSN的精度进行比较。结果表明,DRSN在EMG识别准确性方面表现出传统的神经网络。本文提供了一种对EMG信号进行分类以及探索DRSN可能应用的可靠方法。
translated by 谷歌翻译
制定了具有机器学习模拟(骆驼)项目的宇宙学和天体物理学,通过数千名宇宙的流体动力模拟和机器学习将宇宙学与天体物理学结合起来。骆驼包含4,233个宇宙学仿真,2,049个n-body和2,184个最先进的流体动力模拟,在参数空间中采样巨大的体积。在本文中,我们介绍了骆驼公共数据发布,描述了骆驼模拟的特性和由它们产生的各种数据产品,包括光环,次麦,银河系和空隙目录,功率谱,Bispectra,Lyman - $ \ Alpha $光谱,概率分布函数,光环径向轮廓和X射线光子列表。我们还释放了超过骆驼 - 山姆的数十亿个星系的目录:与Santa Cruz半分析模型相结合的大量N身体模拟。我们释放包含350多个Terabytes的所有数据,并包含143,922个快照,数百万光环,星系和摘要统计数据。我们提供有关如何访问,下载,读取和处理数据AT \ URL {https://camels.readthedocs.io}的进一步技术详细信息。
translated by 谷歌翻译
Audio sound recognition and classification is used for many tasks and applications including human voice recognition, music recognition and audio tagging. In this paper we apply Mel Frequency Cepstral Coefficients (MFCC) in combination with a range of machine learning models to identify (Australian) birds from publicly available audio files of their birdsong. We present approaches used for data processing and augmentation and compare the results of various state of the art machine learning models. We achieve an overall accuracy of 91% for the top-5 birds from the 30 selected as the case study. Applying the models to more challenging and diverse audio files comprising 152 bird species, we achieve an accuracy of 58%
translated by 谷歌翻译
我们将图形神经网络训练来自小工具N体模拟的光晕目录的神经网络,以执行宇宙学参数的无现场级别可能的推断。目录包含$ \ Lessim $ 5,000 HAROS带质量$ \ gtrsim 10^{10} 〜h^{ - 1} m_ \ odot $,定期卷为$(25〜H^{ - 1} {\ rm mpc}){\ rm mpc}) ^3 $;目录中的每个光环都具有多种特性,例如位置,质量,速度,浓度和最大圆速度。我们的模型构建为置换,翻译和旋转的不变性,不施加最低限度的规模来提取信息,并能够以平均值来推断$ \ omega _ {\ rm m} $和$ \ sigma_8 $的值$ \ sim6 \%$的相对误差分别使用位置加上速度和位置加上质量。更重要的是,我们发现我们的模型非常强大:他们可以推断出使用数千个N-n-Body模拟的Halo目录进行测试时,使用五个不同的N-进行测试时,在使用Halo目录进行测试时,$ \ omega _ {\ rm m} $和$ \ sigma_8 $身体代码:算盘,Cubep $^3 $ M,Enzo,PKDGrav3和Ramses。令人惊讶的是,经过培训的模型推断$ \ omega _ {\ rm m} $在对数千个最先进的骆驼水力动力模拟进行测试时也可以使用,该模拟使用四个不同的代码和子网格物理实现。使用诸如浓度和最大循环速度之类的光环特性允许我们的模型提取更多信息,而牺牲了模型的鲁棒性。这可能会发生,因为不同的N体代码不会在与这些参数相对应的相关尺度上收敛。
translated by 谷歌翻译
In this paper, we propose a robust 3D detector, named Cross Modal Transformer (CMT), for end-to-end 3D multi-modal detection. Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. The spatial alignment of multi-modal tokens is performed implicitly, by encoding the 3D points into multi-modal features. The core design of CMT is quite simple while its performance is impressive. CMT obtains 73.0% NDS on nuScenes benchmark. Moreover, CMT has a strong robustness even if the LiDAR is missing. Code will be released at https://github.com/junjie18/CMT.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Automatic music generation with artificial intelligence typically requires a large amount of data which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by finetuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) shows no such ability beyond naive repetition. Evaluating generated music is a challenging task, more so is evaluating drum grooves with little precedence in literature. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by GPT3 compared to those played by human professionals, exposing the strengths and weaknesses of such generation by language-to-music transfer. Our findings suggest that language-to-music transfer learning with large language models is viable and promising.
translated by 谷歌翻译
Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes with limited several support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support/query features based on a Transformer-like framework. Our key insights are two folds: Firstly, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Secondly, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice from two aspects, i.e., feature-level and instance-level. In particular, we firstly design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, the novel classes can be improved significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarking results on the COCO dataset for FSIS, gFSIS, and iFSIS settings, our method achieves a competitive performance compared to existing approaches across different shots, e.g., we boost nAP by noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.
translated by 谷歌翻译